Linguistic and Machine Methods for Compiling and Updating the Harvard Automatic Dictionary
Pages 951-974



From page 951...
... Research workers with both adequate qualifications in linguistics and experience in the design and operation of automatic information processing machines are relatively scarce. Careful planning is therefore essential in order to enable the performance of large scale routine tasks by a team of clerical and technical personnel assisted by automatic machines.
From page 952...
... An automatic dictionary in which a record is kept of the frequency of use of each entry can provide an accurate and current standard for eliminating the "noise" caused by words common in any text. The inverse inflection algorithm mentioned in Section 2, extended if desirable to account for derivation as well, can be useful in mapping inflectional variants of a stem, or derivatives of a root, into a single class or canonical form.
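The frequency record described in this excerpt can be illustrated with a minimal sketch. The function names, the sample words, and the share threshold of 0.25 are all assumptions made for illustration; the paper does not give a specific criterion here.

```python
from collections import Counter

def update_frequencies(freq, text_words):
    """Record one use for every dictionary entry looked up in a text."""
    freq.update(text_words)
    return freq

def noise_words(freq, threshold):
    """Return entries whose share of all recorded lookups is >= threshold.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    total = sum(freq.values())
    if total == 0:
        return set()
    return {word for word, count in freq.items() if count / total >= threshold}

# Made-up sample lookups, for illustration only.
freq = Counter()
update_frequencies(freq, ["the", "stem", "the", "tape", "the", "of", "of"])
common = noise_words(freq, threshold=0.25)
```

Because the counts are updated on every lookup, the set of "noise" words stays current as more text passes through the dictionary.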
From page 953...
... In searching texts for the occurrence of key words or word combinations, the use of inverse inflection and derivation algorithms will permit specifying keys in a canonical form, with the guarantee that occurrences of inflected or derived variants will also be detected. As the translation algorithms based on the use of automatic dictionaries grow in sophistication, the ties between automatic language translation and the many areas where syntactic analysis and code conversions are necessary will very likely continue to be strengthened.
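Searching by canonical key might look like the sketch below. The ending table is a toy English-like stand-in chosen only to keep the example readable; it is not the inverse inflection algorithm of the paper, and `canonical` and `find_key` are hypothetical names.

```python
# Hypothetical endings, listed longest first; a toy stand-in for a real
# inverse inflection algorithm.
TOY_ENDINGS = ("ing", "ed", "s")

def canonical(word):
    """Map a word to a toy canonical form by stripping one known ending."""
    for ending in TOY_ENDINGS:
        if word.endswith(ending) and len(word) > len(ending) + 2:
            return word[: -len(ending)]
    return word

def find_key(text_words, key):
    """Return positions of all words that share the key's canonical form."""
    target = canonical(key)
    return [i for i, word in enumerate(text_words) if canonical(word) == target]

# The key may itself be given in any inflected form.
hits = find_key(["tapes", "sorted", "cards", "sorting", "sort"], "sorts")
```

Both the key and the text words pass through the same mapping, so the search does not need a list of every variant of the key.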
From page 954...
... It seems more efficient to carry a few doubtful words through routine compilation procedures and to provide for their automatic removal (based on a criterion of frequency of use), once operating experience has accumulated, than to spend valuable personnel time on intricate and inconclusive selection procedures.
From page 955...
... The pure gold panning procedure by-passes these difficulties, but at the price of commitment to the system of distinct entries for distinct inflected forms. If experimental work is to be carried out on existing machines, facilities sufficiently ample and economical to store a large dictionary are presently available only in the form of magnetic tapes or punched cards.
From page 956...
... Our system of classification is based on the assumption that the identification of the inflectional pattern of a word must, for the time being at least, remain a manual function, while the actual generation of distinct inflected forms can be an almost completely automatic process. Therefore, ease and accuracy of identification must be promoted by using any readily obtainable data meaningful to a person, while the generation must be based strictly on explicit orthographic data recognizable by a machine.
From page 957...
... In a dictionary of canonical form stem entries, a spurious form usually leaves no traces, since it leads to a stem identical to those obtained from the other distinct inflected forms. This procedure is very much like one used quite frequently in approximating mathematical functions over a given range: any convenient function may be used that suitably approximates the desired function in the specified range; its behavior outside of this range is of no consequence.
From page 958...
... For example, in the class N4, the set of distinct inflected forms is described as consisting of the standard dictionary canonical form, plus several other forms generated by adding specified endings to a generating stem. The rule of formation for the generating stem itself is given in such terms as "canonical form minus last letter." The generating stems defined in the formation rules for distinct inflected forms are not necessarily identical with the stems that will be used as entries for the dictionary.
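A rule of the kind "canonical form minus last letter, plus specified endings" can be sketched as follows. The sample word and the three endings are hypothetical illustrations and are not taken from the paper's definition of class N4.

```python
def generate_forms(canonical_form, endings, drop=1):
    """Generate distinct inflected forms from a canonical form.

    The generating stem is the canonical form with its last `drop`
    letters removed; each ending is appended to that stem.
    """
    stem = canonical_form[:-drop] if drop else canonical_form
    return [canonical_form] + [stem + ending for ending in endings]

# Hypothetical word and endings, for illustration only.
forms = generate_forms("карта", ["ы", "е", "у"])
```

The rule operates purely on letters, which matches the excerpt's requirement that generation rest on explicit orthographic data a machine can recognize.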
From page 959...
... In addition, significant examples, lists of exceptions, and so forth, have been given wherever possible. Our system also embraces a large number of words whose formation is "irregular." For example, the class N4.31 comprises words in which the vowel "o" is introduced in one inflected form.
From page 960...
... An inflectional class marker, assigned in the manner outlined in Section 3, was then written on each card. The card file is used only in the initial compilation process, since new words found in texts as a by-product of dictionary operation will be made available automatically in a
From page 961...
... ADJECTIVES (table content not legible)
From page 962...
... familiar layout, it does not preserve normal Cyrillic alphabetic order, and alphabetization of words in the code of column 2 is impossible. Magnetic tapes obtained from the typewriter are therefore used as input to a code conversion run in which the typewriter code of column 2 is converted into the ranked code given in column 3. The correspondence between Cyrillic characters and machine characters becomes that given between columns 1 and 3.
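The purpose of the code conversion run can be shown with a small sketch. The typewriter code values below are invented, only five letters are covered, and the ranked code is simply the letter's position in alphabetic order; none of these values come from the paper's figure.

```python
# A five-letter subset of the Cyrillic alphabet, in alphabetic order.
CYRILLIC = "абвгд"

# Made-up typewriter codes that follow a keyboard layout, not the alphabet.
TYPEWRITER_CODE = {"а": 17, "б": 42, "в": 9, "г": 30, "д": 3}

# Ranked code: numeric order agrees with alphabetic order.
RANKED_CODE = {ch: rank for rank, ch in enumerate(CYRILLIC, start=1)}
TYPEWRITER_TO_RANKED = {TYPEWRITER_CODE[ch]: RANKED_CODE[ch] for ch in CYRILLIC}

def convert(word_codes):
    """Convert one word from typewriter code to ranked code."""
    return [TYPEWRITER_TO_RANKED[code] for code in word_codes]

words = ["два", "баба", "гад"]
typed = [[TYPEWRITER_CODE[ch] for ch in word] for word in words]

# Sorting the ranked codes numerically alphabetizes the words.
ranked = sorted(convert(codes) for codes in typed)
decode = {rank: ch for ch, rank in RANKED_CODE.items()}
sorted_words = ["".join(decode[code] for code in word) for word in ranked]
```

Sorting the `typed` lists directly would order the words by keyboard code instead, which is why the conversion precedes any alphabetization.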
From page 963...
... corded in the ranked code obviously cannot be easily read, so that material which must be read quickly is subjected to still another code conversion run in which the character strings given in column 4 of Fig.
From page 964...
... Because the criteria for classification are based largely on the configuration of the last letters of each word, words with the same class marker tend to be brought together on this list.
From page 965...
... This identification number accompanies all forms of a word throughout compilation, to facilitate the identification of Russian words represented in the ranked code, and the tracing of errors. Of the last three digits of the fifth machine word, the two low order ones specify the character position within a machine word at which the last letter of the Russian word occurs, while the high order digit specifies the machine word (O.
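Decoding the three-digit field described above could be sketched as below. The excerpt breaks off before stating how machine words and character positions are numbered, so the sketch only separates the digits and makes no assumption about the numbering base; the function name is hypothetical.

```python
def decode_last_letter_locator(three_digits):
    """Split a three-digit field into (machine word digit, character position).

    The high order digit identifies the machine word; the two low order
    digits give the character position within that word.
    """
    if not 0 <= three_digits <= 999:
        raise ValueError("expected a value with at most three decimal digits")
    machine_word, char_position = divmod(three_digits, 100)
    return machine_word, char_position

# Hypothetical field value, for illustration only.
locator = decode_last_letter_locator(207)
```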
From page 966...
... In case of doubt they are retained, to be deleted when their zero frequency of use after a long period of operation automatically indicates that they should be.
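Deletion by zero frequency of use might be sketched as follows. The notion of a minimum number of observed lookups stands in for "a long period of operation" and is an assumption, as are all names and sample entries.

```python
def prune_unused(dictionary, use_counts, lookups_seen, min_lookups):
    """Drop entries never used, but only after enough operating experience.

    `min_lookups` is a hypothetical stand-in for a long period of operation.
    """
    if lookups_seen < min_lookups:
        return dict(dictionary)  # too early to judge; keep doubtful entries
    return {entry: value for entry, value in dictionary.items()
            if use_counts.get(entry, 0) > 0}

# Made-up entries and counts, for illustration only.
entries = {"stem_a": "gloss a", "stem_b": "gloss b", "stem_c": "gloss c"}
counts = {"stem_a": 12, "stem_c": 1}

early = prune_unused(entries, counts, lookups_seen=13, min_lookups=1000)
late = prune_unused(entries, counts, lookups_seen=5000, min_lookups=1000)
```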
From page 967...
... Words marked with nominal and verbal class markers are inflected before those with adjectival class markers. This means that, on the last run, those adjectival forms generated together with their class markers as a result of verb inflection can be inflected with the other adjectives.
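The ordering of inflection runs can be sketched as a two-pass procedure. The markers "N1", "V1", "A1" and the toy `toy_inflect` generator are hypothetical; they only show how adjectival forms produced during verb inflection are deferred to the final, adjectival run.

```python
def toy_inflect(word, marker):
    """Toy generator: returns (form, class marker or None) pairs."""
    if marker.startswith("V"):
        # A verb yields a finite form and one adjectival form with its marker.
        return [(word + "-v1", None), (word + "-part", "A1")]
    if marker.startswith("N"):
        return [(word + "-n1", None)]
    return [(word + "-a1", None), (word + "-a2", None)]

def inflect_all(entries, inflect):
    """Inflect nominal and verbal entries first, adjectival entries last."""
    pending_adjectives = [e for e in entries if e[1].startswith("A")]
    output = []
    for word, marker in entries:
        if marker.startswith("A"):
            continue
        for form, form_marker in inflect(word, marker):
            if form_marker is not None and form_marker.startswith("A"):
                pending_adjectives.append((form, form_marker))
            else:
                output.append(form)
    # Last run: original adjectives plus those generated from verbs.
    for word, marker in pending_adjectives:
        output.extend(form for form, _ in inflect(word, marker))
    return output

result = inflect_all([("adj", "A1"), ("verb", "V1"), ("noun", "N1")], toy_inflect)
```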
From page 968...
... On the magnetic tapes used for further processing, only a stem is present in the first three machine words. The split ending is stored in the last five character positions of the fourth machine word of the item, where it may be seen in ranked code.
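The item layout described here, with the stem in the first three machine words and the split ending in the last five character positions of the fourth, might be modeled as below. The word length of 12 characters, the space padding, and the right-justification of the ending are assumptions for illustration; the excerpt states none of them.

```python
WORD_LEN = 12   # assumed characters per machine word
PAD = " "       # assumed filler character

def pack_item(stem, ending):
    """Return four machine words: three for the stem, one holding the ending."""
    if len(stem) > 3 * WORD_LEN or len(ending) > 5:
        raise ValueError("stem or ending does not fit the assumed layout")
    stem_field = stem.ljust(3 * WORD_LEN, PAD)
    words = [stem_field[i:i + WORD_LEN] for i in range(0, 3 * WORD_LEN, WORD_LEN)]
    # Ending occupies the last five character positions of the fourth word.
    fourth = PAD * (WORD_LEN - 5) + ending.rjust(5, PAD)
    return words + [fourth]

def unpack_ending(item):
    """Read the split ending back from the fourth machine word."""
    return item[3][-5:].strip()

# Hypothetical stem and ending, for illustration only.
item = pack_item("карт", "ами")
```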
From page 969...
... 11. This layout is designed to guide the manual inscription of English correspondents and of grammatical coding associated with the stem canonical forms.
From page 970...
... Functionally distinct paradigms lumped into one inflectional class to simplify classification and automatic inflection are also distinguished by means of a notation in the organized word. For example, the distinction between animate and inanimate nouns is of no consequence so far as the generation of distinct inflected forms is concerned, but it is vital to the interpretation of the functional significance of endings.
From page 971...
... The first stem in such a set is always that split from the standard dictionary canonical form, and is marked by the letter F in the seventh column of the fourth machine word of the Russian item. Unless it is to be deleted, this stem is usually given a left-hand margin marker I
From page 972...
... FIGURE 12. Assembled dictionary.
From page 973...
... W. FOUST, "Inflected Form Generators," Design and Operation of Digital Calculating Machinery, Progress Report AF=9, Sec.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.