Automatic Language Processing and Computational Linguistics

Over the past 10 years the government has spent, through various agencies, some $20 million on machine translation and closely related subjects (see Appendix 16). This is more than the government cost of translation for 1 year. Other moneys have been allocated to information retrieval, library automation, and programmed instruction. Although techniques of machine construction and programming for time-shared operation have been developed with partial support from the government, the computer industry has spent its own resources in machine development, and expenditures in connection with automatic language processing have played a distinctly minor role in advances in computer hardware.

Industry has also been responsible for the development of important techniques of computer justification and hyphenation of newsprint and related matters of composition (see Appendix 17), perhaps because the market was easy to determine.

As opposed to its small effect on computer hardware, work toward machine translation, together with the computational linguistic work that has grown out of it, has contributed significantly to computer software (programming techniques and systems). These contributions are discussed in considerable detail in Appendix 18.

By far the most important outcome of work toward machine translation has been its effect on linguistics, which is described in more detail in Appendix 19. The advent of computational linguistics promises to work a revolution in the study of natural languages. A decade ago, most linguists believed that syntax had to do with word order, inflection, function words (e.g., prepositions and conjunctions), and intonation or punctuation. They also believed that most sentences uttered by native speakers in ordinary contexts were syntactically unambiguous. Today, they know that these two beliefs are mutually inconsistent.
Their knowledge is the immediate result of computer parsing of
ordinary sentences, using reasonable grammars as hitherto conceived and programs that expose all ambiguities under a fixed grammar.

Today there are linguistic theoreticians who take no interest in empirical studies or in computation. There are also empirical linguists who are not excited by the theoretical advances of the decade, or by computers. But more linguists than ever before are attempting to bring subtler theories into confrontation with richer bodies of data, and virtually all of them, in every country, are eager for computational support. The life's work of a generation ago (a concordance, a glossary, a superficial grammar) is the first small step of today, accomplished in a few weeks (next year, in a few days), the first of 10,000 steps toward an understanding of natural language as the vehicle of human communication. The revolution in linguistics has not been solely a result of attempts at machine translation and parsing, but it is unlikely that the revolution would have been extensive or significant without these attempts.

We see that the computer has opened up to linguists a host of challenges, partial insights, and potentialities. We believe these can be aptly compared with the challenges, problems, and insights of particle physics. Certainly, language is second to no phenomenon in importance. And the tools of computational linguistics are considerably less costly than the multibillion-volt accelerators of particle physics.

The new linguistics presents an attractive as well as an extremely important challenge. There is every reason to believe that facing up to this challenge will ultimately lead to important contributions in many fields. A deeper knowledge of language could help

1. to teach foreign languages more effectively;
2. to teach about the nature of language more effectively;
3. to use natural language more effectively in instruction and communication;
4. to enable us to engineer artificial languages for special purposes (e.g., pilot-to-control tower languages);
5. to enable us to make meaningful psychological experiments in language use and in human communication and thought (unless we know what language is we do not know what we must explain); and
6. to use machines as aids in translation and in information retrieval.

However, the state of linguistics is such that excellent research, which has value in itself, is essential if linguistics is ultimately to make such contributions.
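The report's observation that parsers "expose all ambiguities under a fixed grammar" can be made concrete with a small modern sketch. The following toy CYK chart parser counts the distinct parse trees a grammar assigns to a sentence; the grammar, lexicon, and function names here are illustrative inventions, not material from the report. The classic prepositional-phrase attachment ambiguity ("with the telescope" modifying either the seeing or the man) yields two parses that a native speaker would rarely notice unaided.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (illustrative, not from the report).
# Binary rules A -> B C:
binary = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # attach PP to the verb phrase
    ("NP", "NP", "PP"),   # attach PP to the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
# Lexical rules: word -> possible preterminals
lexical = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(words):
    """CYK chart: chart[(i, j)][A] = number of distinct parse trees
    deriving words[i:j] from nonterminal A."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(words):
        for a in lexical.get(w, []):
            chart[(i, i + 1)][a] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for a, b, c in binary:
                    nb = chart[(i, k)].get(b, 0)
                    nc = chart[(k, j)].get(c, 0)
                    if nb and nc:
                        chart[(i, j)][a] += nb * nc
    return chart[(0, n)].get("S", 0)

print(count_parses("I saw the man with the telescope".split()))  # -> 2
print(count_parses("I saw the man".split()))                     # -> 1
```

Even this six-rule grammar finds two readings of a seven-word sentence; the grammars of the 1960s experiments, applied to ordinary prose, found far more.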
Such research must make use of computers. The data we must examine in order to find out about language are overwhelming both in quantity and in complexity. Computers give promise of helping us control the problems relating to the tremendous volume of data, and to a lesser extent the problems of data complexity. But we do not yet have good, easily used, commonly known methods for having computers deal with language data.

Therefore, among the important kinds of research that need to be done and should be supported are (1) basic developmental research in computer methods for handling language, as tools for the linguistic scientist to use as a help to discover and state his generalizations, and as tools to help check proposed generalizations against data; and (2) developmental research in methods to allow linguistic scientists to use computers to state in detail the complex kinds of theories (for example, grammars and theories of meaning) they produce, so that the theories can be checked in detail.
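The concordance, named earlier as the life's work of a generation ago, is the simplest example of the "computer methods for handling language" the report calls for. The sketch below, a hypothetical keyword-in-context (KWIC) index written for illustration, shows how little machinery the task now requires; the function name, width parameter, and sample text are assumptions of this sketch, not the report's.

```python
import re
from collections import defaultdict

def concordance(text, width=30):
    """Build a keyword-in-context index: for every word form,
    list each occurrence centered in a fixed window of context."""
    index = defaultdict(list)
    for m in re.finditer(r"[A-Za-z']+", text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        index[m.group().lower()].append(
            f"{left:>{width}}[{m.group()}]{right:<{width}}")
    return index

sample = ("Language is second to no phenomenon in importance. "
          "A deeper knowledge of language could help many fields.")
for line in concordance(sample)["language"]:
    print(line)
```

Run over a corpus of millions of words rather than two sentences, the same loop is the "few weeks" version of what once took a career; the research the report urges is what lies beyond such mechanical tabulation.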